Building AI-Assisted Analysis Workflows from Scratch
- “Pure” vibe coding: In its most exploratory form, a user might fully trust the AI’s output to work as intended. As Karpathy framed it, this is akin to “forgetting that the code even exists,” making it best suited for rapid ideation or what he called “throwaway weekend projects,” where speed is the primary goal.
- Responsible AI-assisted development: This is the practical and professional application of the concept. In this model, AI tools act as a powerful collaborator or “pair programmer.” The user guides the AI but then reviews, tests, and understands the code it generates, taking full ownership of the final product.
We will go backward: from the final reports to the construction of a vibe coding environment (`claude.md` and `skill.md` files).
The main claim about vibe coding is that you will be more efficient and accomplish more tasks.
I will be honest with you: I spent far more time preparing the vibe coding session than it would take me to perform such an analysis (at least in R).
However, now I can run multiple sessions. And I have documented my knowledge.
Knowledge shouldn’t be shared orally between veterans and juniors.
It should be documented and enriched by each generation.
This includes not only scientific best practices but also internal best practices
Example:
Personal data should be stored in a Google Cloud bucket with lifecycle management (maximum of 10 years).
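As an illustration of such an internal rule, here is a sketch of a GCS lifecycle policy, in the JSON format accepted by `gsutil lifecycle set`, that deletes objects after roughly 10 years. The exact policy shape shown here is an assumption for illustration, not part of the rule itself:

```python
import json

# Hypothetical lifecycle policy: delete objects older than ~10 years.
# The "age" condition is expressed in days.
lifecycle_policy = {
    "rule": [
        {
            "action": {"type": "Delete"},
            "condition": {"age": 10 * 365},
        }
    ]
}
print(json.dumps(lifecycle_policy, indent=2))
```

Saved to a file, such a policy could be applied to a bucket with `gsutil lifecycle set policy.json gs://my-bucket` (bucket name hypothetical).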
If you had to think of your job, lab, or company as an industry, what would be the different departments? How do they interact? How do they share information?
Example: You are the head of research at a prominent pharmaceutical company.
Basically, a department is an agent, and its missions are the skills. Today, we will focus on skills, not agents.
We will write many guidelines for our AI agent (Claude Code). At some point, it might resemble a pipeline.
The key difference is that the details you provide about what you are doing can dramatically change the course of the analysis. Also, you are less constrained by the developer’s pipeline and much more flexible, especially regarding the input.
My rule of thumb:
Vibe coding does not replace anything; it is a new way of working.
If you don’t know these tools well:
If you already know these by heart, you are among the top 10% of high-quality Python developers… please leave; you will learn nothing here!!! 🤣
By instructing your agent to follow these guidelines, you bias the code generator toward best practices in general.
Finally, if your code (like mine) was primarily copied and pasted from Stack Overflow, the generated code will probably be an improvement.
The dataset comes from the 3rd International Summer School on Non-Targeted Metabolomics Data Mining for Biomedical Research. I found it on GitHub thanks to Perplexity.
During this summer school, there was an exercise we are trying to reproduce using Vibe Coding:
Non-invasive drug monitoring is desirable as it improves patient experience. Instead of relying on invasive blood draws, drug pharmacokinetics and metabolism could, for example, be monitored through the skin. During the summer school, you will be working with a subset of the data published by Panitchpakdi and collaborators (2022). The dataset consists of plasma and skin swabs (forearm, forehead, and upper back) collected over the course of 24 hours from healthy volunteers (n=7) who had been administered the antihistamine diphenhydramine. Your task will be to investigate whether:
- Diphenhydramine and its metabolites can be detected in the skin, and whether they exhibit similar pharmacokinetics as in plasma.
- You can observe other metabolites that exhibit interesting time trends in plasma, and whether those metabolites can also be detected in the skin.
I asked Claude to create Quarto notebooks with Python as the programming language.
If you’re not familiar with Quarto, it is the successor to RMarkdown. In my opinion, it is far superior to Jupyter. Even these slides were created using Quarto.
Want to learn more?
Please visit Mickaël Canouil’s course, “Mastering Quarto CLI: From Authoring to Publishing,” at https://m.canouil.dev/mastering-quarto-cli/
The report.
Note
🧠 During the tour, please ask yourself:
What is missing? What should we implement? What would I want to see?
We obtain a very nice introduction: it rephrased the project and isolated the questions.
Then we have a nice Study Design section that helps to understand the dataset.
However, here it states the study is IRB-approved. Of course it is, but I never said so in the context, and it had no access to the original paper.
Hallucinations can be very tricky to identify!
Did you notice the trick I used in my prompt?
2 Metabolomics Data Preprocessing.
See “Human notes”:
What about the original paper figures?
Note
For some reason, I don’t have the 24h in the dataset.
| Aspect | Pros | Cons |
|---|---|---|
| Speed | Full analysis completed in less than 2 hours, including waiting time | Needs to be split over several days due to Pro Plan limitations (reset after 4h30) |
| Output Quality | Detailed reports explaining the reasoning and findings | Reviews must look for “sabotage,” not just “errors” |
| Methods | Primarily uses state-of-the-art methods | Fell into common pitfalls such as missing parameter settings and improper imputation |
| Validation | When comparable, results closely match both the original paper and summer school outcomes | — |
| Visualizations | Attractive interactive plots | Required a second run to switch to Plotly |
Note
I should be there max 45 minutes after the start, meaning before 10:15.
Pause – 15 minutes.
Speaking of Claude, we must be clear about what we are talking about, because there are quite a few interfaces:
Claude in a browser 🕸
Claude Desktop 💻
Claude VS Code extension ⌨
Claude CLI 📟
We want maximum control over what we are doing; thus, Claude CLI is the best choice.
However, from time to time, I use the others for convenience. Here, we will use only Claude CLI.
Launch Claude Code from a terminal with the `claude` command.
You should keep a close eye on your credits, either with the `/usage` command or on the web page https://claude.ai/settings/usage
A full session (reaching 100%) fills about 10% of the weekly limits. Thus, managing both the context and the model is really important if you reach the limits quite often (I do). Today, I ran a dummy claude query at 6am to obtain a reset at 11am.
The challenge: Sometimes you need to understand the codebase before making changes
The solution: Plan Mode - read-only exploration first
To enter Plan Mode, type /plan or press Shift+Tab twice:
▐▛███▜▌ Claude Code v2.1.42
▝▜█████▛▘ Sonnet 4.5 · Claude Pro
▘▘ ▝▝ ~\chemometrics_vibe_coding
❯ /plan
⎿ Enabled plan mode
───────────────────────────────────────────────────────────────────────
❯
───────────────────────────────────────────────────────────────────────
⏸ plan mode on (shift+tab to cycle)
Switch to Opus 4.6 with the /model command.
▐▛███▜▌ Claude Code v2.1.42
▝▜█████▛▘ Sonnet 4.5 · Claude Pro
▘▘ ▝▝ ~\chemometrics_vibe_coding
❯ /plan
⎿ Enabled plan mode
❯ /model
───────────────────────────────────────────────────────────
Select model
Switch between Claude models. Applies to this session and future Claude Code sessions. For other/previous model names, specify with --model.
1. Default (recommended) Opus 4.6 · Most capable for complex work
❯ 2. Sonnet ✔ Sonnet 4.5 · Best for everyday tasks
3. Haiku Haiku 4.5 · Fastest for quick answers
▌▌▌ Effort not supported for Sonnet
Use /fast to turn on Fast mode (Opus 4.6 only). Now 50% off through Feb 16.
Enter to confirm · Esc to exit
Opus 4.6 burns through your credits like crazy 🔥. For the workshop, I don’t want to run out of credits, so I’m using medium effort.
The effort parameter allows you to control how eager Claude is to spend tokens when responding to requests. This lets you trade off between response thoroughness and token efficiency, all in a single model.
For the implementation, we will switch to Sonnet 4.5 to save some credits.
Ideally, YOU have an opinion on what is missing from the analysis, and we can try a live prompt.
If not, let’s say we have forgotten to implement the last phase of our initial plan.
The prompt:
Use @ to reference a file you absolutely want it to read.
Use Ctrl+O to see more details about what is going on.
❯ @data/CD-MRG-metabolomics_summerschool_2023/plan.md suggest another analysis I could do with this dataset
⎿ Read data-MRG-metabolomics_summerschool_2023.md (162 lines)
∴ Thinking…
The user wants me to suggest another analysis they could do with their metabolomics dataset. Let me first explore what’s already been implemented to understand what’s done and what could be added.
● Let me explore what’s already been implemented to understand the gaps and opportunities.
● Explore(Explore existing analyses)
⎿ Prompt: Explore the codebase to understand what analyses have already been implemented for this metabolomics project. I need to know:
1. What modules exist in src/ and what each one does (read the key functions)
2. What Quarto reports exist in reports/ and what analyses they cover
3. What the data structure looks like (check data/preprocessing_result.pkl loading patterns)
4. Whether there’s a src/biomarkers.py module (it was in the plan but may not have been implemented)
Focus on identifying what’s DONE vs what’s MISSING from the original plan in data/CD-MRG-metabolomics_summerschool_2023/plan.md. Also look for any interesting analysis gaps - things not in the original plan that could be valuable.
❯ Explore the codebase to understand what analyses have already been implemented for this metabolomics project.
Context exploration uses a lot of credits.
It identifies that Phase 6 was not completed.
The messier your project is, the more useless files it reads, wasting your credits for nothing.
In addition, it increases the chances of hallucinations.
It proposes various options; we will implement Phase 6.
Eventually, it consumed about 20% of my 5-hour credits.
In a previous prompt, I had simply asked it to plan the 6th phase.
I save the plan to be able to resume later.
Let’s analyze this plan together.
I switched to Sonnet 4.5 to save credits.
I planned in a previous session, now I want to implement the @data/CD-MRG-metabolomics_summerschool_2023/plan_phase6.md plan
It runs the code by itself and detects an error.
Then it fixes the error by itself.
It finalizes the implementation.
It didn’t implement the Quarto notebook, so we ask for it:
You didn’t create the quarto file for this analysis !
Now we have seen how Claude behaves when we send simple prompts. What did we do beforehand to achieve this behavior?
Review the .py and .qmd files not for errors but for sabotage.
Note
I should be there max 1h30 after the start, meaning before 11:00.
- CLAUDE.md: lists what Claude should always know about the project
- quarto-authoring: full documentation on Quarto, by Mickaël Canouil
- chemometrics-ms-metabolomics: based on Boccard & Rudaz 2018 - Extracting Knowledge from MS Clinical Metabolomic Data
- skill-creator: a skill to create skills, by Anthropic

A markdown file in your project root that Claude Code reads automatically.
It provides:
Think of it as project instructions for your AI collaborator.
# [Project Title]
## Aim of this repository
[What is the purpose? What problem does it solve?]
### Principles
- [Key principle 1]
- [Key principle 2]
## Core Development Rules
1. Package Management
- [Tools and restrictions]
2. Code Quality
- [Type hints, style, conventions]
3. Testing Requirements
- [Framework and coverage expectations]
## Commit messages
[Convention style: conventional commits, etc.]
## Quarto Authoring Guidelines
[If applicable: tables, figures, interactive plots]
## Code Formatting
[Linting tools, line length, pre-commit rules]
## Error Resolution
[Fix order, debugging strategy]
The CLAUDE.md I wrote for this workshop. Rule of thumb: is this information ALWAYS needed, regardless of the context? If not, consider skills or agents. The more you write in CLAUDE.md, the more credits are used.
Project: LC-MS metabolomics analysis for drug monitoring
# Drug Detection in Skin Metabolomics
## Aim of This Repository
Analyze LC-MS metabolomics data to detect diphenhydramine and its metabolites in plasma and skin samples collected over 24 hours.
### Principles
- Follow metabolomics best practices (Boccard & Rudaz, 2018)
- Ensure reproducibility with comprehensive preprocessing documentation
- Enrich statistical significance with biological interpretation
## Tools
- Use Python for the code
- You may use R as a last resort
- Quarto notebooks
I provided these instructions in the initial prompt because I wanted a user experience like:
Warning
I have not tested these tips yet
## Context
Currently, this repository is optimized for vibe coding with some predefined skills to help.
We expect the user to provide some data in `./data` and to tell you the structure, the context of the analysis, and the questions they want answered.
As soon as you have all this information, start by suggesting:
- an edit to `CLAUDE.md` where this paragraph is removed and replaced by:
- The context of the data
- The data structure
- The main questions and objectives of their analysis.
- This section should be a checklist, regularly updated with analyses already done (`- [x]`) and new scientific questions (`- [ ]`).
- Feel free to challenge and propose rephrasing of scientific questions
- Feel free to suggest scientific questions based on context and available data.
It bootstraps the project context, then instructs Claude to replace it with real project details.
## Data Structure
- `data/preprocessing_result.pkl`: Preprocessed metabolomics data
- Keys: `forearm`, `forehead` (sample locations)
- Each contains: `X_processed`, `metadata`, `feature_metadata`,
`peak_areas_raw`, `feature_ids`
- Features: m/z values (ppm tolerance: ±5 ppm)
- Timepoints: 0, 1, 2, 4, 8, 12, 24 hours post-dose
- `data/GNPS_results/`: Compound annotation databases
- GNPS library matches with m/z and retention time
- Outputs: Quarto reports in `reports/`, figures in `_book/`

## Common Pitfalls to Avoid
❌ DO NOT use blank subtraction without careful review
❌ DO NOT use autoscaling (use PQN normalization instead)
❌ DO NOT fit preprocessing on full dataset before analysis
❌ DO NOT ignore mass accuracy and retention time variation
❌ DO NOT rely on statistical significance alone (check biological relevance)
❌ DO NOT report false compounds (verify with GNPS MS/MS data)
✅ Always document feature matching tolerance criteria
✅ Always check multivariate model diagnostics
✅ Always generate dummy datasets with known, controlled effects as a safeguard
Caution
I haven’t tested this yet: ✅ Always generate dummy datasets with known, controlled effects as a safeguard.
If you intend to provide such a repository to biologists who can’t review code on their own, this should serve as a safety net. However, it shouldn’t be just one sentence; it needs to be a set of skills supported by an agent.
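The dummy-dataset safeguard can be sketched in a few lines: build a synthetic feature matrix where one feature carries a known, planted group effect, then check that a simple analysis recovers it. This is a minimal illustration of the idea, not a full validation suite:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 20, 50

# Pure-noise feature matrix, two groups of 10 samples each
X = rng.normal(size=(n_samples, n_features))
groups = np.array([0] * 10 + [1] * 10)

# Inject a known, controlled effect on feature 0 for group 1
X[groups == 1, 0] += 5.0

# Absolute mean difference per feature: the planted feature should dominate
effect = np.abs(X[groups == 1].mean(axis=0) - X[groups == 0].mean(axis=0))
assert effect.argmax() == 0  # the analysis recovers the planted signal
```

If a preprocessing or analysis pipeline fails to recover a signal this strong, something is wrong with the pipeline, not the data.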
Where should each of these go:
Rule: CLAUDE.md = “what and where for MY project”, Skills = “how and why for ANY similar project”
Note
I should be there max 2h00 after the start, meaning before 11:30.
Pause - 15min
Skills = Reusable domain expertise modules
Think of skills as:
The key features here are:
Invoke with /skill-name (or let it be invoked automatically). Make it handy.
Only add context Claude doesn’t already have. Challenge each piece of information:
- “Does Claude really need this explanation?”
- “Can I assume Claude knows this?”
- “Does this paragraph justify its token cost?”
In other words: introduce skills either when you want to enforce a behaviour or when Claude has failed you before.
Good example: Concise (approximately 50 tokens):
Bad example: Too verbose (approximately 150 tokens):
## Extract PDF text
PDF (Portable Document Format) files are a common file format that contains text,
images, and other content. To extract text from a PDF, you'll need to use a library.
There are many libraries available for PDF processing, but we recommend pdfplumber
because it's easy to use and handles most cases well. First, you'll need to install
it using pip. Then you can use the code below...

---
name: chemometrics-ml-selection
description: Choose the proper normalization method for LC-MS.
---
# Normalization method for LCMS
## High level of freedom
Choose one of: TSN, MSTUS, PQN, LOESS, QC-RSC, QC-SVRC
## Medium freedom
Prefer PQN in most situations.
## Low freedom
Use PQN function to normalize
``` python
import numpy as np

def pqn_normalize(X: np.ndarray, reference: np.ndarray | None = None) -> np.ndarray:
    if reference is None:
        # Default PQN reference: the median spectrum across all samples
        reference = np.nanmedian(X, axis=0)
    quotients = X / reference
    norm_factors = np.nanmedian(quotients, axis=1, keepdims=True)
    return X / norm_factors
```
The Claude Skill documentation ends with a checklist.
You can consider it the short version of the documentation, but in any case, it is always better to read it thoroughly.
When you invoke a skill explicitly with /skill-name, you force it to load.
Otherwise, Claude will invoke the skill on its own if it considers it useful in the context.
Important
Skills can contain scripts and Claude instructions to run commands.
ALWAYS read the content carefully. This is quite a cybersecurity nightmare.
This is the safest source:
https://github.com/anthropics/skills/tree/main/skills
I know both Mickaël and Posit well:
https://github.com/posit-dev/skills
I don’t know this company, so be careful; I found no issues in the skills I read:
https://github.com/K-Dense-AI/claude-scientific-skills
Be very, very cautious.
The community highly recommends using /skill-creator when building your own skill.
Find a methodology paper you like and agree with, such as Boccard & Rudaz 2018 - Extracting Knowledge from MS Clinical Metabolomic Data: Processing and Analysis Strategies.
In claude:
Result here: Skill based on Boccard & Rudaz, 2018
Nice, isn’t it?
Can you identify any caveats?
Claude didn’t stick to the paper; it incorporated its own knowledge, for better and for worse.
We don’t care about references in the skill; they are a waste of tokens. The initial prompt should be improved.
The degree of freedom is very low for the multivariate analysis, and in the whole skill in general. I should improve my prompt.
Boccard &amp; Rudaz are very vague about missing values, yet the skill has a list of methods, which is not what I would recommend.
I would recommend reading it carefully and then performing a sanity check with a prompt like:
Now that we’ve seen some caveats, what would you do?
What have we just done?
We have just designed a workflow—that is, a subagent.
| Aspect | CLAUDE.md | SKILL.md |
|---|---|---|
| Scope | One project | Reusable |
| Content | Project specifics | Domain expertise |
| Location | Project root | Skills folder |
| Examples | “My data is in data/” | “For NIR, use SNV” |
| Who writes | You, for your project | You or community |
Use together: CLAUDE.md = what + where, SKILL.md = how + why
Can multiple skills be relevant to a single task?
Yes — Claude should not limit itself to one skill.
Why should a SKILL.md be concise?
The context window is a shared resource. Once Claude loads a SKILL.md, every token competes with conversation history, other Skills, and the user’s request. Only add context Claude doesn’t already have.
What are the three levels of freedom you can set in a Skill?
High freedom — text-based guidelines when multiple approaches are valid.
Medium freedom — pseudocode/scripts with parameters when a preferred pattern exists.
Low freedom — specific scripts with no parameters when operations are fragile and consistency is critical.
Note
I should be there max 2h30 after the start, meaning before 12:00.
Claude Code operates on your machine with your permissions:
Warning
Claude Code is NOT sandboxed by default. It acts with your user permissions.
It can run commands and install packages (`uv add` or `npm install`).
Do NOT run Claude Code on your production workstation with real data.
Instead, set up a disposable cloud VM:
| Aspect | Recommendation |
|---|---|
| Environment | Cloud VM (GCP, AWS, Azure) that can be destroyed and recreated |
| Data | Work on copies of real data, never originals |
| Account | Minimum access rights — no admin, no production credentials |
| Network | Restrict outbound access to only what’s necessary |
| Secrets | No SSH keys, API tokens, or credentials beyond what the task needs |
| Lifecycle | Trash the VM after the session; rebuild fresh for the next one |
Think of it as a clean room for AI-assisted development.
Claude Code has built-in permission controls. Use them:
Allow specific Bash commands (e.g., uv run pytest) while blocking others.
Tip
Even with permission controls, a disposable environment remains the safest approach. Defense in depth: trust the tool controls and limit the blast radius.
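For illustration, a project-level permissions file (`.claude/settings.json`) might look like the sketch below. The `Bash(...)`/`Read(...)` matcher syntax is an assumption that should be verified against the current Claude Code permissions documentation:

```python
import json

# Assumed settings shape; check the current Claude Code docs before relying on it.
settings = {
    "permissions": {
        "allow": ["Bash(uv run pytest:*)"],          # permit the test runner
        "deny": ["Bash(rm -rf:*)", "Read(./.env)"],  # block destructive commands and secrets
    }
}
print(json.dumps(settings, indent=2))
```

Writing this dict to `.claude/settings.json` would scope the allow/deny rules to the project, while user-level settings apply globally.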
Skills and MCP servers are code that Claude executes on your behalf.
Important
Treat skill installation like installing software: review it, understand it, trust the source.
Contact:
Workshop materials: